Spark: a navigational paradigm for genomic data exploration.
Abstract
Biologists possess the detailed knowledge critical for extracting biological insight from genome-wide data resources, and yet they are increasingly faced with nontrivial computational analysis challenges posed by genome-scale methodologies. To lower this computational barrier, particularly in the early data exploration phases, we have developed an interactive pattern discovery and visualization approach, Spark, designed with epigenomic data in mind. Here we demonstrate Spark's ability to reveal both known and novel epigenetic signatures, including a previously unappreciated binding association between the YY1 transcription factor and the corepressor CTBP2 in human embryonic stem cells.
Similar resources
Surfing the city: An architecture for context aware urban exploration
Web surfing, the act of following links of interest with no pre-defined search goal, is a paradigm that can be translated to the physical realm of urban exploration. With mobile computing technology and its supporting infrastructure becoming ever more ubiquitous, a user's digital device can be transformed into a portal that connects their physical environment with the virtual, providing instant...
Creating a Portable, High-Level Graph Analytics Paradigm For Compute and Data-Intensive Applications
HPC offers tremendous potential to process the large amounts of data commonly referred to as ‘Big Data’. Due to the immense computational requirements of Big Data applications, the HPC and Big Data communities are converging. As a result, heterogeneous and distributed systems are becoming commonplace. In order to take advantage of the immense computing power of these systems, distributing data effic...
VariantSpark: Applying Spark-based machine learning methods to genomic information
Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a framework for applying machine learning algorithms in MLlib to genomic variant data using the efficient in-memory Spark compute engine. We demonstr...
Immersive graph-based visualization and exploration of biological data relationships
Genomic information shows some characteristics that make it very difficult to interpret and to exploit. Such data constitute an important factual resource (GenBank, SwissProt, GeneOntology, or Decrypthon...), are heterogeneous, huge in quantity, and are geographically distributed. They are also recorded in structured or semi-structured formats within public or private databanks. Nevertheless,...
hMDAP: A Hybrid Framework for Multi-paradigm Data Analytical Processing on Spark
We propose hMDAP, a hybrid framework for large-scale data analytical processing on Spark, to support multi-paradigm process (incl. OLAP, machine learning, and graph analysis etc.) in distributed environments. The framework features a three-layer data process module and a business process module which controls the former. We will demonstrate the strength of hMDAP by using traffic scenarios in a ...
Journal title:
- Genome Research
Volume 22, Issue 11
Publication date: 2012